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Abstract. We provide an exact expression for the multi-variate joint probability distribution function 
of non-Gaussian fields primordially arising from local transformations of a Gaussian field. This kind 
of non-Gaussianity is generated in many models of inflation. We apply our expression to the non- 
Gaussianity estimation from Cosmic Microwave Background maps and the halo mass function where 
we obtain analytical expressions. We also provide analytic approximations and their range of validity. 
For the Cosmic Microwave Background we give a fast way to compute the PDF which is valid up to 
7(7 for /nl values (both true and sampled) not ruled out by current observations, which consists of 
expressing the PDF as a combination of bispectrum and trispectrum of the temperature maps. The 
resulting expression is valid for any kind of non-Gaussianity and is not limited to the local type. The 
above results may serve as the basis for a fully Bayesian analysis of the non-Gaussianity parameter. 
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1 Introduction 

One of the most interesting predictions of slow-roll single field inflation is that it should generate 
primordial perturbations that are nearly Gaussian distributed, with deviations from non-Gaussianity 
present at the level of the value of the slow-roll parameters [1-10]. Current constraints on the value 
of the slow-roll parameters e.g., [11] indicate that non-Gaussianity from single field slow-roll inflation 
will not be measured from the sky, even if one could access all modes in the sky with futuristic 21cm 
experiments [12, 13]. However, any deviation from the simplest slow roll (with standard Bunch- 
Davis vacuum and canonical kinetic term) implies non-Gaussianity, possibly at a detectable level. In 
particular many multi-field models of inflation will produce detectable non-Gaussianity that current 
or future surveys could measure: see the review by Refs.[10, 14] for a detailed explanation of the 
physics behind it. Although in principle the primordial non-Gaussian signal should be present both 
in the Cosmic Microwave background and in the late large-scale structure of the Universe, most of 
the effort has so far been devoted to the cosmic microwave background where the gravitationally- 
induced non-Gaussianity is small e.g. ,[5, 15-17] and refs therein. However, if the non-Gaussian signal 
is at a clearly detectable level, current estimators (which are exact in the Gaussian limit) become 
sub-optimal and in particular can underestimate the size of confidence regions [18, 19]. Therefore it is 
important to understand exquisitely the distribution of errors of any estimator, not only its variance 
(see [31]): the full probability distribution function (PDF) is needed. 
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Further, a possible detection of non-Gaussianity from the sky would be a direct probe of the 
properties of de Sitter space, as a study of the different correlators of non-Gaussianity would amount 
to performing transition experiments on de Sitter; something that will never be accessible from high- 
energy experiments in the imaginable future. This could open a new window to understand the 
properties of gravity in de Sitter. One could, for example, answer the question if fluctuations could 
be treated semi-classically as we do now. 

Work so far has concentrated in finding the one-point PDF for primordial non-Gaussian initial 
conditions as this has a direct application on the calculation of the halo mass function and rare event 
abundance. This area has seen an exponential expansion starting with the works of [20, 21]. However 
for applications to e.g., the Cosmic Microwave Background (CMB) an expression for the multi-variate 
joint PDF is needed. 

It is common practice to describe non-Gaussianity as a Taylor expansion of the potential of the 
form $ = (f> + /nl(0 2 — (4> 2 )) [5, 7], where tj> is a Gaussian random field. This can be extended to 
the cubic power if the quadratic term is sub-dominant or zero, in which case the free parameter is 
<7nl- This is the so-called local type non-Gaussianity because it is a local transformation in real space. 
More general kinds of non-Gaussianity beyond the local type can be described by their bispectrum 
(although this description is not exhaustive as fields with the same bispectrum and power spectrum 
can have different higher-order correlations) e.g., [14] and references therein. 

In addition to the above, it is the multi-variate joint PDF of the non-Gaussian primordial field 
which is predicted by theory and which has important clues about the physics of the perturbations of 
de Sitter space; therefore having an exact estimate of it goes beyond the simple experimental merit 
of constraining one parameter, as it may yield clues on fundamental physics. 

Previous work on estimating the PDF for the CMB bispectrum or the non-Gaussianity parameter 
/nl has focussed on performing Monte Carlo simulations, but these are extremely expensive and may 
not fully sample the tails of the distribution (i.e. beyond 3 a) [19, 22-24]. Further complications arise 
form the fact that the PDF will depend at some level on the value of /nl itself. 

In this paper we circumvent these limitations and provide an exact solution based on analytical 
methods. In particular, we provide an exact analytical formula for the one-point PDF in Eq. (4.12) 
that can be used for any computation of the mass function of dark matter halos. 

Further, an exact formula for the joint multi-variate PDF is provided in Eq. (4.20) and (4.26), 
which can be numerically integrated. For the CMB we provide a very accurate formula for the PDF 
(up to la fluctuations in the field for /nl values not ruled out by present data) expressed in terms 
of the observed bispectra and trispectra. This serves as the basis of a fully Bayesian analysis of /nl; 
therefore, it can provide a very accurate way to compute PDF for the upcoming Planck data. 

The paper is organised in a pedagogical way, first, as a warm up, solving the straightforward 
Gaussian case in § 3 and then the non-Gaussian one in § 4, where we also discuss comparison with 
previous estimators and present an analytic approximation which is valid up to la fluctuations in 
the field for /nl values not ruled out by present data and applies to non-Gaussianities beyond the 
local type. We conclude in § 5 with an application of the formalism to the mass function of dark 
matter halos and the CMB. The reader interested only in practical applications can find our relevant 
formulas in Eqs. (4.26, 4.28, 5.2). 

2 Set-up 

Let us define a field A, which is the observable field and can be the temperature of the CMB or the 
a™ or the matter overdensity field etc... The value of the field at any spatial point i is related to the 
underlying potential $ via 



where in the second equality we have introduced a short hand notation. For many applications 
A4(x,y) = A4(|x — y|) = A4(r). For example if ^4(x) is the (linearly evolved) density field <5(x) 




(2.1) 
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smoothed on a given radius R then we could express this in Fourier space: 



M(k) = / d 3 ye-^X(|y|) = W R (k)T(k)g(k) (2.2) 



where W^(k) is the Fourier transform of the assumed filter (typically a top hat for the mass function), 
T(k) is the transfer function and g(k) — —2/3(k/Ho) 2 Q~^ . The fact that Fourier quantities are 
complex should not be a problem, one works with a complex field rather than a real one. If the 
Ai denote the spherical harmonic coefficients of the CMB temperature field, ai m , then the equation 
becomes: 

a lm = / ^-^) 9l {k)Y; m {k) (2.3) 



(2tt) 3 

where gi(k) denotes the radiation transfer function an thus we have effectively an Ai lm (k) and $(k) 
I d 3 xexp[— ik ■ x]$(x). If we observe a Fourier space quantity then one can relate it to the real space 
$ as 

A 3 = J d 3 yexpHk-y)$(y) (2.4) 

i.e. -M(k, y) = exp(— ik • y), and j labels the wavevector k. 

Several physically-motivated models for primordial non-Gaussianity are expressed as local trans- 
formations of an underlying Gaussian field. For example, the non-Gaussianity of the so-called local 
type [5, 7, 25] is given by 

$ = <t> + /nl(0 2 - (0 2 » = (j> + /nl(0 2 - 4) , (2.5) 

where is a Gaussian random field and /nl is the non-Gaussianity parameter which is assumed to be 
small. Here by small we mean that this expression is a truncation to linear order in /nl of a physical 
model that includes e.g., the self coupling of the field during inflation. In the following we would 
like not to make any further approximations if possible, thus deriving whenever possible expressions 
that are valid even if I/nlc^I is not 1. Other local models include the so called <?nl parameter 
[21, 26, 27]: 

$ = ^ + 5 niV- 3v*t£) = + Sni> 3 ( 2 -6) 

where ip, <j> are Gaussian random fields. 

In both cases the observable field A can be written as a sum of a Gaussian term A G , corresponding 
to <[> plus a non-Gaussian correction A NG , corresponding to the term cx <f> 2 or cx (j) 3 . 

The full information about the Gaussian field <f> is given by the Gaussian generating functional 
V[4>] (see e.g., Eq 11 of Ref. [20]) which is the basis of several calculations in this paper. For example, 
the joint probability of A\, ...A n is given by: 

/n 
mV(4,)l[S D [A i -(MM<f>))] (2-7) 
t=i 

where we have explicitly written that the field $ is a function of the Gaussian random field <p. Here, 

exp[-l/2(0,X o ,0)] 
/W $ D[ct>]eM-W,K ,<P)] { -> 

and we have used the shorthand notation 

J d 3 2/d 3 z0(yMy-z)0(z) = (0,if», (2.9) 

and with Kq is defined by the relation to the 20point correlation function of (j), 

J d 3 yK (\ X - y|)^(|y - z|) = «J(|x - z|). (2.10) 
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Index 


range 


meaning 


a:g 
v.z 

v or greek 


I'M 
l:n 

n + 1M 


number of elements of the full discretized space of <f> 
number of elements of the discretized A array 
for which we want the joint probability 
the remaining elements of the full discretized space 



Table 1. Summary of the index convention. The number of elements of the observable field A is L. Of course 
n < L, L < M , and n < M . Note that for M to be invertible a necessary but not sufficient condition is that 
the maximum possible size of A to be M i.e. M — L. 



Then to compute, for example, the probability distribution of a product of n A's we use 

P(B = A l ...A n ) = j dA 1 dA 2 ...dA n P(A 1 ,...,A n )5 D (B- Ai...A n ). (2.11) 

Here n is 3 for the three point function (in Fourier space it is related to the bispectrum), n = 2 for 
the correlation function (in Fourier space it is related to the power spectrum) of A and n = 1 for the 
(one point) PDF of A. 

In these types of integrals, the Fourier representation of the Dirac delta function involves highly 
oscillatory functions and complicates the integrals. On the other hand, observations are often dis- 
cretized/pixelized, so one possibility is to discretize these integrals to simplify their analytic calcula- 
tion, then if needed take the continuous limit at the end. Let us pixelize our field A (say A is known 
at each pixel of a map) and we have L pixels, let us also pixelize the $ field for which we have in total 
M pixels (L < M). As we will see this simplifies a lot the spatial integrals and the integrals in 2? [</>]. 
The integrals in A{ are however continuous integrals over all possible values that the field can take at 
position i. 

Let us define the index convention we will use. L denotes the size of the observable array A, of 
which we want to compute the n point distribution n < L, M denotes the size of the pixelized array 
4> (or $). Note that n < L < M. Let us call M — n = N. The i indices (or latin indices of the last 
part of the alphabet: i-z) run from 1 to n and v (or greek indices) runs from n + 1 to M. From now 
on latin indices of the first part of the alphabet (a-h) runs from 1 to M. Note that if A and $ have 
the same dimensions then M. becomes a square matrix (M x M) and everything below is relatively 
simple. However this does not need to be, as it is clear in the case of the a; m where the i index runs 
over the -discrete- I, m but the index j represent the integral in k which is in principle a continuous 
variable (but is discretized in most practical applications). 



Discretizing we obtain for Equations such as Eq. (2.10) and Eq. (2.1): 
/ d 3 yM(xj, y)$(y) — > J2 a -^ja^a etc. Also we can express D[cj>] = d<j>id(j> v . 
The Gaussian integral in the denominator of Eq. (2.8) gives 



X nr and 



(2tt 



dctKo 



(2.12) 



The Dirac delta function becomes S(A{ — J^j ^ij4>j)- 

Finally, let us report here a result for Gaussian integrals with we will use repeatedly below. The 
Gaussian integral is 



dip exp 



1 



ip T Kip + B T ip 



(27r) m 
dot K 



exp 



B T K *B 



(2.13) 



with K a square matrix m x m and B and if are vectors of size m. 



3 Gaussian field $ 



To illustrate the approach and familiarize ourselves with some of the expressions, let us start in the 
simplest case where /nl, = 0, (7nl = 0. We begin by considering the case where M. is a square, 
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invertible matrix. In what follows, we denote with the over script arrow the vector composed of the 
value of the field in all the pixels, e.g., A = {A\, A n }. 



3.1 M invertible, n = L = M 

In this case we can do the following change of variables: 
Kq = (M^ 1 ) 7 ^ KqM^ 1 (matrix multiplication); Y\ a d(j) a 



det(M) 



-> M<p or <f> a = Y,ab M ab(t>b and 
We thus obtain: 



T(A) 



IdetKv f n 



(2tt) m J det(M) 

-l^2 A aA b K Qab 



H<t>jKoij 



I det K 

(2^F exp 



ah 



(3.1) 



Which is not surprising given that if the <fi field is a multivariate Gaussian the A field properties 
are also those of a multi-variate Gaussian albeit with a different covariance. 
Note that Kq is defined so that 



dVMIx - y|)Uly - z|) = <K|x - z|). 



(3.2) 



In other words, Kq is the equivalent quantity of Kq, not for the underlying field <f>, but for the 
observable field A. 

3.2 M. identity matrix or invertible, n < L, L = M 

Let us start with the simpler, special case where M. is the identity matrix, for example, when looking 
at the distribution of the product of n pixel values of an unsmoothed field. The values taken by the 
field 4> will be denoted by (p (here <p plays the role of A but we have changed notation to make clear 
that one is really seeing the same field). 
We start from: 



V ((pi,...,(p n ) 



(2 



(3.3) 



exp 



The integral over 



2 J 2,1/ [l.U 

can be done trivially exploiting the Dirac delta function yielding: 



V ((pi,...,(p n ) = 



I det K 

M 



(27T) 



(3.4) 



exp 



Here we recognize the Gaussian integral of Eq. (2.13) where in particular B v = ^\ (piK 0il , and K is 
the matrix made of the elements Kq„ v . 

Note that the matrix Kq can be split in block matrices as 



K 



K Ik 
k T K 



(3.5) 



where K is made of the K$ ij elements (it is a square sub matrix of size n x n) , K, is made by the Kq 
elements (it is a square matrix of size N x N), and k is an n x N matrix. 
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Finally we obtain that: 



p , - dct(g) 



1 - 

- ViVjQi. 



Here 



= ifoij - ^ K Qw K Qj ^IC 1 = K - 

and we have used the fact that 

detKo = det (&) = det(K - kAT V) det(AC) 



(3.6) 



(3.7) 



(3.8) 



implying det (if o)/ det (AT) is det(Q). 

Eq. (3.6), not surprisingly, is a Gaussian but with a different covariance from that of <p. It is not 
difficult to see, using the Woodbury formula 



(A + UCV)- 1 = A- 1 - A^UiC- 1 + VA~ 1 U)~ 1 A~ 1 
(where A, U, C, and V denote generic matrices) that 

[G }ij = [K Q ]ij — £ij 



(3.9) 



(3.10) 



as expected. 

In case M. is not the identity matrix but it is still an invertible matrix, as long as the array A 
has been pixelized in as many elements as </>, it is straightforward to compute P(A) even if n < M . 
One just does the substitution: fa — > J2 a -Mi a (pa etc. as above. In total one gets, 



P(A) 



'det(£0 



(27T) 



exp 



2 ^ 

ij=i 



AiAjQi 



(3.11) 



where Kq = (-M M KqM 1 (matrix multiplication) and Q is the equivalent of Q (see Eq. (3.7)) but 
made from Kq instead of Kq, {G~ 1 ]ij = [Kq 1 ]^ = £,Aij- That is we have a Gaussian with covariance 
given by that of the observed field. 

3.3 M non invertible, n < L, L < M 

In this case the observable field A is pixelized in L elements, but the dimension of the field <j> (and 
the field <&) is larger, for example M = L x H (i.e. there is one (or more) extra dimensions which gets 
pixelized to have H elements). 

It is easy to see that one can rewrite Eq. (2.7) considering that if <fi is Gaussian then its projection, 
e.g., to ai m , will also be Gaussian (M. is a linear operator). So we only have to reinterpret Eq. (3.1) 
in light of Eq. (2.1) to have <f> — A G and Kq to be the equivalent quantity for the Gaussian A G , in 
the case of the <x; m it will be connected to the Cg. In this case there is no need to invert M.. It is 
however instructive in view of the non-Gaussian case, to derive the same result going through the 
actual calculations. 

We should start from 



exp 



f I . V 



(3.12) 
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We can also split M in block matrices as: 



M = (X AO 



(3.13) 



where H is a n x n sub matrix and AT is n x N. We also split the vector <p — {(pi, <f>„} — {<p' , </>"}. We 
should start by re-writing the argument of the Dirac delta as: 



(3.14) 



now we do the following change of variables 

A - 



(3.15) 



H is invertible if the rank of A4 is maximal. Note that there is some freedom in splitting the sum in 
Eq.(3.14), but we argue that the rank of M. is always maximal if the information in A is not redundant 
(e.g., in the absence of noise if pixels are not counted more than once). This change of variables gives 
a multiplicative overall factor | det H | and which changes the argument of the Dirac delta to 



Ai-fa- (K- W) 2 



Ai 



(3.16) 



Now we can integrate over the d<f>i and eliminate the Dirac deltas giving the following substitution 

<j>i — ► A\ - 4>'i We get: 



V(A) 




, f ,detN| / d(j)"exp 



1 



A T KA + 2A T 1l(j)" + ct)" T Qcf)"^ 



de ^°|detH| J d^"eM-\{A,<P"} T H{A^"}] 



(2 



where 



H 



k n 

K T Q 



and 



TZ = k - KX^AT 



Q = }C + Af T ^- 1T K^- 1 AT-Af T ^- 1T k-k T ^- 1 Af 



(3.17) 
(3.18) 

(3.19) 
(3.20) 

(3.21) 



which is an TV x N matrix. 

We then recognize a Gaussian integral yielding 



V(A) = 



/det K 
(2tt)" 



det HI 



exp 



1 rp ~ 

2 



(3.22) 



where 

T = K -KQ- 1 ^ (3.23) 

and therefore, using Woodbury again, [J 7-1 ]^ = [H' 1 ]^, and T = H -17 T^T 1 . Of course, in this 
simple example we are using the general framework to compute the PDF of another Gaussian field, A, 
whose covariance is simply A4 T £,A4. It is straightforward using the Woodbury formula to demonstrate 
that we recover this result. For the Gaussian case there is therefore no need to invert Q or H. 

Equivalcntly, one could change the basis for <j> to tp so that the matrix A4 is M! = (N', 0). This 
being a linear operation means that ip is still Gaussian. Note that this means: take the rectangular 
matrix given by{H,A/"} and SVD decompose it as UT,V T (or UT,V* (conjugate transpose)). Then do 
the following change of variables: <p — > U<p — ip and therefore A4 — ► U~ 1 A4 = M! . The Jacobian 
of the transformation is therefore det U. 



-7- 



Thus the argument of the Dirac delta becomes Ai — Wip' and as before we make the change of 
variable A — > H'~ 1 A = A. This now simplifies the equations above as Q' = K! and TZ' = kf. (Of 
course the Jacobian of the transformation must be computed, alternatively one makes sure a posterior 
that the probability is suitably normalized). So one gets: 



V{A) = 



/det^ 0| |detH'| 



1 



: exp 



- 1 A T g'A 

2 



(3.24) 



(2tt)« I det U\ yjdeW. 

where §' = H' _1T ^'H' _1 and Q' = K' - VJ Q!~ x yJ T (where prime denotes the quantity in the ip 
basis. 

Of course Eqs. (3.22) and (3.24) are equivalent but depending on the context one or the other 
might be more suitable for practical implementation. Note that Q' = (A4' J Ai') where we use 
the prime to indicate the quantity for the ip field. Being £' = U£JJ T and Ai' = U~ 1 A4 we obtain 
M' T £M' = M T iM = U and therefore Q' = Q, [G'-% =Uij- 

To interpret Q e.g., in Eq. (3.22) we can readily write down the 1-point PDF (n = 1), where Q 
is a scalar, Q\\'- 



V{A X ) = -==exp 



2?r 



-\a\9ii 



(3.25) 



If A is the field in real space then Q(R) is l/erjj. of the field, R being the smoothing scale. If A 
is in Fourier space then Q(k) is 1/P{k). 

3.4 Adding Gaussian noise 

Having Gaussian noise in a map means that the observed field A is given by the superposition of 
the field A as before and an independent Gaussian field e, with covariance S e , A = A + e. Thus in 
Eqs. (2.7, 3.1, 3.12 ) etc. the Dirac delta gets substituted by a Gaussian with covariance S e . It is easy 
to see that if E7 1 = £ then the second line of Eq. (3.1) remains the same provided that Kq — > Q 



where Q = 



£- 



In doing so we have used the result for the Gaussian integral Eq. (2.13) 



and that £ ~ £{£ + K )- 1 £ = Kq 1 + £~ 

Therefore for all the subsequent cases above one must just make the substitution Q — > [G^ 1 + £] 
Note that we could have obtained this result more directly as follows: the PDF of the superposition 
of two independent processes (^4 and e) is the convolution of the individual PDFs. If the two PDFs 
are Gaussian the resulting PDF will be a Gaussian with inverse covariance given by the sum of the 
individual inverse covariances. 



4 Quadratic local non-Gaussianity, /nl 7^ 

In this case we note that /nl appears only in the argument of the Dirac delta function (see Eq. (2.7)). 
It is much easier to derive the final result in the case where M is the identity matrix. 

4.1 Ai identity matrix 

One possible way to proceed is to use the property of the Dirac delta: 

where r here denotes the roots of the function /. If now M is the identity matrix, we have to solve 
Ai + C— (f> — /nl^ 2 = where C is a constant to make the field <& to have zero average (C = /nl(0 2 ))- 
The roots of this are: 

r -i±yi + 4 /NL (A i + c) (42) 

2/nl 
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and 

Our Dirac delta becomes 



/' = 1 + V. 



NL<i 



n 



-l— v /l+4/ NL (A i +C) 
2/nl 



v /l + 4/ NL (A i + C) 



v/1 + 4/nl(^+C) 



(4.3) 



(4.4) 



In the case n = 1 there is no product, only the two deltas. Let us call 
Then Eq.(2.7) becomes: 



-1+ v /l + 4/ NL (A i + C) 
2/nl 

-l-y/l+4f NL {Ai+C) 



(4.5) 



2/, 



XL 



Metif 

(^) M V V1 + 4/nl(A: + C) 



n Vl + 4/ N L(A: + C)/n^^ -5? 



(4.6) 



x 53 ex P 

77/ 



1 53 /^o/^)^ - 5Z w 4 ')^^- 



For small /jvx the root with the minus sign will be exponentially suppressed. Fig. (1) shows the effect 
of neglecting the root on the 1 point PDF (which can be easily computed) for several choices for /nlc. 
For simplicity we have taken <fi to have unit variance. Where the distribution is not plotted is because 
it becomes imaginary (i.e. there is no solution). The figure shows that neglecting the root r = 2 is 
a very good approximation since deviations are small and non-zero only close to the limit where the 
PDF is not defined. Neglecting the r = 2 root we obtain: 



nlJ 



Met K Q 



n i 

11 J\ + 4/nt 



(2^) M V v/1 + 4/nl(A ! + C) 



(4.7) 



x exp 



If n < M we can recognize the Gaussian integral again and simplify the above equation: 



, (2<) n V V1+4/nl(^ + C) 
For n — M this is instead: 



exp 



(4.8) 



(4.9) 



7>(A/nl) 



/detifo 



n 1 

11 x/l + 4/ni 



(27r)» Y ^l + 4/ NL (A l + C) 



exp 



(4.10) 



Note that this is the multivariate distribution of the $ field (M. having been set to the identity). 
Of course, to obtain the same quantity for M not the identity matrix, one can then always write that 



P(A/nl) = / V[$]V($)5 U (A- / M*) 



(4.11) 
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Figure 1. Exact one point PDF for <E> (normalized to have unit variance) magenta on the left panel and 
the effect of neglecting the exponentially suppressed root for "small" /nl on the right panel. Several choices 
for /nlo> are shown. Where the distribution is not plotted is because it becomes imaginary (i.e. there is no 
solution). Neglecting the root r — 1 is a very good approximation since deviations are small and non-zero 
only close to the limit where the PDF is not defined. 



This then must be integrated numerically via Monte-Carlo methods. If M. is the identity matrix, 
the one point function can be obtained by setting, i,j = l as for the Dirac delta of Eq. (4.9). One 
then recognizes the Gaussian integral in <fi obtaining 



NL) 



det(0) 



exp 



exp 



(4.12) 

4.2 Approximations 

It is possible to further simply our expressions by expanding to first (or higher) order in /nl- For 
example expanding at first order the square root we obtain (1 + 2/nl J2i(Ai + C)) both in Eqs. (4.9) 
and (4.10), in the same spirit then fi(Ai) — > Aj + C — A|/nl- Figure (2) shows how different 
approximations perform in the case of the one-point PDF (where Ai is the identity matrix). We 
illustrate the performance of the different approximations by using the one point PDF and taking cf> 
to have unit variance (a — 1). Note that due to this normalization the real expansion parameter is 
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Figure 2. Left: the one point PDF (the underlying Gaussian field cj> is normalized to have unit variance 
(a — 1)). Solid line with diamonds: exact. Solid (black) line: keeping only terms proportional to /nlc both 
in the argument of the exponential and in the factor in front. Solid thin (black) line: First order expansion. 
Dotted line (purple): keeping terms proportional to /nlct 2 both in the argument of the exponential and in 
the factor in front. Short dashed line (blue):as above but expanding the terms oc /nlo" 2 in the argument of 
the exponential as exp(x) = (1 + x). Dot-dashed (light blue): expression where starting from an expansion 
that keeps terms proportional to /nlo" 2 both in the argument of the exponential and in the factor in front, 
only terms oc $ and $ 2 are kept in the argument of the exponential, the rest is expanded as exp(x) — 1 + x. 
Solid thick (red): Second order Edgeworth expansion. Long dashed (red): Expansion to second order in /nlc. 
Dot-dot-dashed line (green): approximation of Babich (2005) Right: % error in the approximation (same 
linestyle as in the left panel). See text for more details. 



/nlc- For clarity below we will use the symbol f n i to indicate the non-Gaussianity parameter for a 
field with unit variance, thus f„i = /nlc- 

The solid (black) line shows the first order expansion of both the square root and the argument 
of the exponential: 



P lst ($,/, i; ) = -L(l-2/„ ; <f)exp 

V Z7T 



-J$ 2 +/„ ; (^ 3 -*) 



(4.13) 



Note that the Babich (2005) [28] (B05 from now on) approximation of the PDF (see their Eq. 
20) would correspond to approximating (1 — 2/„;$) ~ exp(— 2/„;<i>). On the other hand the first 
order Edgeworth expansion (see e.g. [21] their Eq. (4.10)) corresponds to further expanding the non- 
Gaussian part in the exponential by exp[/„/($ 3 — $)] ~ 1 + /„;($ 3 — $) (thin solid (black) lines). The 
popular non-Gaussianity estimator [22, 28, 29] where the term ($ 3 — 3$) is considered stems from an 
expansion of the PDF (more on this below). 

In order to make the path integrals in what follows analytic it is worth to note that an integrand 
made by the product of a polynomial with an exponential of a quadratic expression is integrable 
analytically but an exponential of an expression of power higher than quadratic is not. However 
expanding the f n i§ 3 term in the exponent as (1 + f n i® 3 ) performs very badly as Fig. (2) shows 
(dot-dashed). 

Fig. (2) shows that the first order expansion in /„; does not provide a good approximation to 
the PDF, and indeed it is well known for example that a first order Edgeworth expansion can give 
negative values for the PDF (see discussion in [21]). Going to second order helps. Expanding to 
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Figure 3. Hybrid approach: the exponential part preserves the Gaussian structure as it has only terms oc 
and oc <f> 2 . Again cj> is normalized to have unit variance. 



second order both the square-root and the argument of the exponential yields the dotted line in the 
figure. If the second-order part of the exponential is also expanded as exp(x) = 1 + x we obtain the 
short dashed line. 

Expanding the full expression to second order in /„; (thus expressing the PDF as a Gaussian 
multiplied by a correction) we obtain something similar (but not quite identical) to the second-order 
Edgeworth expansion: 



>2nd 



($,/„/) = 



1 



exp 



1 r 
— 

2 



l + /„*($ 3 -3$) 



where the second-order Edgeworth expansion would be: 



-,2ndEW 



(*,/nj) = 



1 



V2ir 



exp 



f 2 

Jul 

2 



f 

1 + fnl($ 3 - 3$) + ($ 6 - 13$ 4 + 33$ 2 - 9) 



$o _ n$ 4 + 23$-" - 5 



(4.14) 



(4.15) 



The fact that Eq. (4.14) and Eq. (4.15) are different at second order should not be too surprising. In 
fact while Eq. (4.14) is a polynomial expansion around f n i = using polynomials in their natural form, 
Eq. (4.15) is an expansion in Hermite polynomials, which, for the problem at hand, are an orthogonal 
basis. It is well known that expansions in orthogonal polynomials have smaller truncation errors, 
much better convergence and extrapolation properties than other polynomial expansions. Therefore a 
truncated Edgeworth expansion should be preferred to a Taylor expansion truncated at the same order. 
Not unexpectedly, our numerical investigations have shown that the difference between Eq. (4.14) and 
Eq. (4.15), at least for relatively small /„;, is small. 

Note that the B05 [28] second-order expansion would correspond to 



->2ndBabich 



1 



/2tt 



exp 



_I$ 2 
2 



/ ni (* 3 -3$) 



f 2 

Jnl 



-5$ 4 



14$^ - 5 



(4.16) 



One "hybrid" approach is to look for simplicity of integration, first expand to second order both 
the square root and the argument of the exponential, then leave in the argument of the exponential 
only terms oc 4> 2 and oc f n i<j> the rest expand as exp(x) = 1 + x + x 2 . This is shown in Fig. (3). 

Having different approximations used in the literature and the exact expression allows us to 
quantify their performance. In table (4.2) we show the range of $/ct where the different approximations 
work at better than 20%. We conclude that a second-order expansion (Edgeworth or plain second 
order or hybrid) reproduces well the PDF except in the extreme tails ( > 5a) for relatively high 
f n i values and in addition offers the possibility of performing many of the relevant path integrals 
analytically. In what follows we will therefore use this approximation when needed. 
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/nlc 


1st order Babich(2005) 


2nd order 


second-order Edgeworth 


Hybrid 


0.1 


3 


4 


5 


4 


0.01 


4 


6 


6 


6 


0.001 


7 


10 


10 


8 



Table 2. Number of sigmas where the different approximations fit the exact PDF to better than 20%. Recall 
that for simplicity throughout this section we have taken the field to have unit variance. 

Note that in the form of B05 the log of the PDF to second order is written as a Gaussian part 
plus a first order term given by 1\ x / NL and a second order term I 2 x /^ L . = 6/nl and (7 2 ) = 6. 
If we write the second order expansion or the second order Edgeworth expansion in the B05 form we 
obtain the same 1\ expression but different expressions for Ii- Note however that (I2) is always equal 
to 6. 

While so far we have interpreted the PDF as a probability of $ for a given /nl, we can reinterpret 
it, assuming a uniform prior on /nl, as the PDF for /nl given $. Doing so, as we will show in details 
below, will uncover some properties of this PDF which we believe were missed so far. These stem from 
the fact that in 7 , ($|/nl), /nl is fixed and it is therefore physically motivated to chose it to be small. 
Conversely, when considering ^(/nlI^), the /nl values are not bounded and can in principle range 
from —00 to 00. For large |/nl| (or combinations of relatively small |/nl| but large |$|), T^/nl^) 
has discontinuities which prevent any expansion in /nl- It is only when working in a regime away 
from the discontinuities that expansions in /nl are meaningful and moments of the distribution are 
well defined so the the central limit theorem applies. 

4.3 Estimators vs. full PDF 

We take a detour to discuss the choice of an estimator for /nl given the exact and approximated 
expressions of the PDF in the simplified one-point case. 

It is worth recalling that the maximum likelihood estimator (MLE) is often desirable because, if 
it exists, under mild conditions, it is asymptotically the best unbiased estimator. To derive a MLE one 
interprets the PDF (probability of the data given the parameters) as the likelihood for the parameters 
given a set of data, and finds its maximum. This conditions give a combination of the data ($ in 
our case at hand) which is a MLE of /nl (or /„/) i.e. /nl (or /„;) . The drawback of a MLE is 
that in many problems the full PDF is not known or if it is known, its maximum cannot be found 
analytically. For our exact formulation of the PDF, it is clear that the maximum must be found 
numerically. In addition, in some regimes the PDF diverges making a MLE not well defined. Fig. (4) 
-solid line, top panel- shows the locus of the maximum of the likelihood as a function of <E> for the 
simple one-point PDF investigated above (i.e., in this sub-section we keep considering </> to have unit 
variance). The dotted line is for guidance at /nl = 0. The dashed line corresponds to the estimator 
of B05. In correspondence of the diamond symbols in the bottom panels we show the shape of the 
likelihood as a function of /„; for fixed $ values (0.1, 0.5, 1, 1.5). The likelihood is well-behaved as 
long as 4> is small. The location of the "cusp" in the curve corresponds to |<f>| = 1 and for $ = 1 + e 
(where e is a small number) the PDF diverges, the discriminant becomes negative (and the PDF is 
zero) splitting the divergence through the middle. For |$| > 1 there are two points where the PDF 
diverges, separated by a region where the discriminant is negative and the PDF is zero. 

Furthermore, the support of the PDF depends on the parameter, since it is zero for $ < —f n i — 
l/(4/„i), and this makes the Cramer- Rao bound invalid [30]. 

These conditions make the MLE not well defined. Note that the B05 estimator is not a MLE 
except for / n ; = 0. Its error is shown to satisfy the Cramer-Rao bound, but that just corresponds to 
an expansion of the PDF i.e. it approximates the PDF as a function of f n i as a Gaussian distribution. 
The plots in Fig. (2) show that this is not a good approximation: even when <& is very small the 
likelihood has broad wings. While it is true that the central limit theorem should make the PDF 
close to Gaussian for practical applications where many independent measurements are considered, 
the full PDF shape carries much more information than its mean and the variance, as discussed at 
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Figure 4. Top panel. Solid line: the numerically determined maximum likelihood as a function of /„; (= /nlc) 
for several values of 3>. Here the underlying Gaussian field (j> is taken to have unit variance (a = 1). For 
<J?| > 1 the PDF diverges making a maximum likelihood estimator not well defined. Dashed line: the Babich 
(2005) estimator. The dotted line is for reference at /„; = 0. The inset shows a zoom in in the region around 
fni = 0. The Bottom 4 panels show slices through the /„; likelihood for fixed values of $ (0.1, 0.5, 1, 1.5 
from top left to bottom right), which maximum correspond to the locations of the diamonds. Note that the 
likelihood for /„; is always highly non-Gaussian. For |$j > 1 there are two points where the PDF diverges that 
are separated by a region where PDF is zero. For |<3>| > 1 the derivatives of the likelihood are non continuous. 
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length in [31]. Since we have the exact expression for the PDF we can compute the possible bias of 
the estimator by computing (numerically) the expectation value of it. 

We numerically find that the estimator of B05 is unbiased even for large values of /„/. In 
particular the bias of the estimator scales like f*, as follows: 

bias of estimator = 0.001 

Given that the maximum likelihood estimator is not well defined and that the full likelihood is 
known, and that the PDF carries more information than just the mean and variance of an estimator, 
it is much more useful to work directly with the full PDF. This is what we set up to do here. 

The above discussion is valid for a single pixel. In any practical application many independent 
measurements will be combined, and a full analysis would include in-field correlations (e.g. [32]). For 
illustration, we consider first a full posterior analysis for /nl based on the complete PDF in n pixels. 

Using Bayes' theorem, the probability of the non-Gaussianity parameter having a value /„; is 
V(f n i\®) V(3>\f n i)> for a uniform prior on /„/. For i = l,...,n pixels, assumed independent, the 
probability of /„/ given a set of pixel values = $ (normalized so that the underlying Gaussian 
field <j) has unit variance for the case at hand) is 

V{fra\$) = nH/W|^) « HV($i\fnl) 

i i 

SO 

\nV(f nt \S) = ^ In ]/„,) + const. 

i 

If the true /„/ is /^ ue , then the probability of a pixel having a value between $ and $ + d& is 
7 5 ( < I > |/^ I / uc )(i$. So the sum over the set of pixels becomes an integral over with a weight given by 
■p^l/n™ ) (and a normalisation given by the number of pixels): 

lnV(f ni \$) = n J d*7>($|/£ UB )hi7>($|/ n ,) + canst. 

Note that the for the purpose of this illustration the number of pixels enters only as a normalization 
factor; in fact the sum is evaluated as an integral and the limits of integration are (formally) from — oo 
to oo. In practice, given a finite number of pixels, the chances of sampling the tails of the distribution 
are small and thus effectively the limits of integration should depend on the number of pixels. 

The above equation implies that p(f n i\&) is zero for /„/ > f^ ne , since there is an excluded region 
which depends on f nt : V($\f n i) = for $ < $ min where $ min = -f nl - l/(4/ ni ) (Note that this 
limit is very small for small /„/, e.g. for / n j = 0.01, $ > —25.01). Hence, if /„; > /^ ue , there is a 
region of <I> where "P($|/*™ c ) ^ and \nV(&\f n i) — > — oo and the integral diverges. This is borne 
out by doing the integration (see Fig. (5)) (Note that this argument applies to small /„;, where 4> m j n 
increases with f n i). This sharp discontinuity will also prevent any truncated expansion of the PDF 
being a good approximation. 

However, in a practical case, the chances of reaching the 'excluded' region which is forbidden 
for a small (positive) f n i are basically vanishingly small, so the cliff in the posterior is not going 
to happen in practice, because of sampling, or it could also happen because of noise. For example, 
if we introduce a small Gaussian noise into the measurements, then the probability of reaching the 
otherwise forbidden region of $ is no longer zero, and this allows values of /„; > /^ ue to have non-zero 
probability. Fig. (6), shows the posterior for local non-Gaussianity with a Gaussian noise added, with 
rms only 0.01; the zero probabilities have gone away, and the posterior becomes quite Gaussian. This 
behaviour is quite robust for small smoothing; for large smoothing the posterior gets wider. 

The cutoff disappears also if we now consider that for a sample of finite size the tails of the 
distribution are not well sampled. For example for /„; = 0.05(0.1), the region $ < — 5.05(— 2.6) is 
not sampled if the sample is smaller than about 1.7 x 10 6 (80). The disappearance of the cliff makes 
truncated expansions in /nl of the PDF viable. Note however that a truncation to first order will 
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Figure 5. Log(Posterior) for f n i(= /nlo"), with true /^; ue =0.1 and 10 6 pixels but ignoring sampling issues 
and noise. /„; > f^ uc is excluded, as the true PDF is non-zero for some values of $ where the trial PDF is 
zero. 
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Figure 6. Log(Posterior) for f n i(= /nlo") given 10 6 pixels, and /S uo = 0.03. A Gaussian noise of rms 0.01 
added (recall that the underlying Gaussian field cj> is normalized to have unit variance). The noise makes 
all values of "3? reachable in principle, and so the posterior is always non-zero. This removes the strict cutoff 
apparent in the previous figure. 



yield to a 'P 1s *(/nl) that does not have a maximum, which therefore cannot be a good approximation. 
At the minimum to have a maximum the approximated PDF must be truncated at the second order 
in /nl- 

We next explore the performance of the estimator, the exact PDF and its approximations in the 
regime where sampling is finite. To do so we generated simulations {realizations) of uni-variate non- 
Gaussian fields for given values of f n i (which we refer to as /*™ c ) and proceed to recover (measure) 
it from samples of the field (which we call f^, where m stands for measured). Before proceeding we 
verified that the distribution of the realization matches to analytic exact PDF for all /„/ values. This 
is shown in Fig. (7) for f% uc = 0.03. 

We concentrate here in two regimes: one where there are few samples (only 50 $s, small sample 
size) and one where there are many more samples (1500 ^s, large sample size). For simplicity we 
assume that the pixels (samples) are uncorrelated. Fig. (8) shows the distribution of the f™[ ob- 
tained using the exact PDF (solid line), the first and second order PDF (dashed and dotted, almost 
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Figure 7. Left panel: PDF simulated (3 million $'s, thick solid line), analytical exact expression (dashed 
line), first order (dotted) and second order(dot-dashed). Right panel: % difference compared with simulated 
PDF, analytical exact expression (solid), first order (dotted) and second order(dashed). Here the underlying 
Gaussian field (j> has unit variance and /^ ue = 0.03. 




Figure 8. Distribution of f™[ for the exact PDF (solid line), first order (dotted) second order (dashed), 
B05 estimator (Dot-dot-dot-dashed) in the small sample size limit. The thin vertical dotted line shown the 
location of ,f^; uc (0.03 on the left panel and —0.07 on the right panel). 

indistinguishable) and the B05 estimator (Dot-dot-dot-dashed). To be explicit, since the pixels are 
uncorrelated we can multiply the individual likelihoods P(/ n ;|$i...$ n ) = FJ- P(/ n ;|$,-) and is the 
value that maximizes such product. 

Clearly the distribution of f™ t is non-Gaussian. In particular we note that if /*™ c is small (0.03 
in this example) there is a non-negligible probability that the f n i of the sample is ~ 0.15 if the sample 
is small. The shape of the PDF for small sample size reflects the behavior seen in Fig. (4) bottom 
right panel. 

This behaviour disappears when the sample size is increased as shown in Fig. (9). Note that all 
approximations and the estimator have a variance that is larger than the one given by the exact PDF 
when /^™ e is large. As /*™ c — > the PDF, its approximation and estimator distributions converge. 
The resulting distribution remains however non-Gaussian. This demonstrates the importance of 
working with a PDF (exact if /^ ue is large, exact or approximated if f^ ue is small): the confidence 
intervals for ^> ler or ^> 68% levels, cannot be estimated from the rms. In the absence of a PDF, 
many simulations of the experiment would be required to estimate these confidence intervals via Monte 
Carlo simulation. On the other hand knowing the PDF allows one to compute any desired confidence 
interval. Although in this section we have worked in the case where M. is the identity matrix, we 
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0.03 



Figure 9. Distribution of f% for the exact PDF (solid line), first order (dotted) second order (dashed), B05 
estimator (Dot-dot-dot-dashed) in the large sample size limit. The thin vertical dotted line shows the location 
of fni ue - Note that all approximations overestimate the width of the true distribution for large /^ uc . As 
fni u ° — ► the PDF, its approximation and estimator distributions converge. 

expect that these findings hold qualitatively in the general case. 
4.4 If A4 is not the identity matrix 

If A4 is not the identity matrix but it is an invertible square matrix then we need to go back to the 
expression for V as: 



~\ E fo&jKoij ~ E ^"Koiv ~ ^ E ^"Kopv 



exp 



If n = M then one could do 



(4.18) 



(4.19) 



which also introduces a factor of 1/ det.A4 -1 = det A4 in front of the RHS. This complicates the 
Dirac delta function but leaves us with a Gaussian integral. 
We obtain 



?Wnl) = 



/ det K 
(2^F 



E^ ex p 



detM]J^ = 

a 1 + 4f NL (A a + C) 

lj2-fr(Aa)f q (A b )K 0ab 



ab 



(4.20) 



if we keep only one root for small / we get that r and q are = 1. 

For a possible practical application M is a square invertible matrix; for example when A corre- 
sponds to the matter overdensity field S be it in real or Fourier space. If n < M this route cannot be 
used (see below for the approach to use) but one could always write : 



P(AJnl)= V[A]V(A)S d (A-A). 



(4.21) 
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So for example for the one point PDF one should do: P(A X ) = J V[A]V{A)5 D {A x - A). By applying 
the Dirac delta one reduces the dimensionality of the path integral by one. The remaining integral 
might still have to be performed numerically. 

4.5 M. not invert ible, n < M 

To obtain directly the probability of P(A) if n < M and A4 is not invertible let us go back to Eq.4.18 
and as before let us now split the Dirac delta function so its argument is given by the elements of: 



A — N$' — m" =A-W- K.W' 2 - N(p" - /nlW 2 + / nl <tSc 



(4.22) 



where the elements of the matrix c are given by Ci = A4i a From now on let us define as c = /nlc^c. 

Now note that to impose the Dirac delta function one must first integrate in <f>' as such it is 
convenient first to change variables so that the argument of the Dirac delta function is multiplied by 
H -1 , which yields a | det K| in front of the equation and then it is useful to define : 



M<t>", /nl) - K-M - - /nlK- W 2 + N" 



with this the roots of the quadratic equation are: 



-1± Jl + 4/ NL ^(0",/ NL ) 

<t>r,i = 7T t 

giving that the Dirac delta function becomes: 

6 D (<l>i-<l>r=l)+6 D (</> i -<l>r=2) 

^l + 4/ NL i, 

giving: 



(4.23) 



(4.24) 



(4.25) 



P(A, /nl) = J det H J d<jy" ]J (jl + ifa^W' , fa)) exp -i £ ^K 0fw 



J2 exp 

r=l,2 



\ ij ^ J 



(4.26) 

Again one can safely ignore the exponentially suppressed root. Note that even if only an integral in 
<fi" is left, there are complications to perform this analytically: <j>" appears not only in front of the 
exponential as an argument of the function A (under the square root), but also in the exponential 
both under square root in the <j> r and explicitly. 

This equation is one of the main results of this paper as it can be interpreted as the probability of 
/nl given the observable vector A. The expression is exact but the path integral must be performed 
numerically for different /nl values. The integrand is close to a Gaussian so the numerical integration 
should converge easily but detailed implementation will depend on the particular application one is 
interested in. We will return on this in Sec. (5). 

4.6 Approximations 

In order to progress analytically approximations must be made. Sec. (4.2) and (4.3) indicate that a 
second-order (Edgeworth) expansion is a good approximation for the full PDF if /nl is small and 
the sample size is reasonably large. While we have quantified this numerically for the single pixel 
and Poisson, uncorrelated pixels case, we expect that qualitatively the same will hold in the more 
general, multi-variate case. While for the one-point PDF where Ai is the identity matrix it is simple 
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to expand directly the PDF, as done in Sec. (4.2) this becomes unnecessarily complicated if we want 
to work with Eq. (4.26). In fact the work leading up to Eq. (4.26) and the route we chose, are 
necessary if one wants to find a closed form for the exact PDF. If one instead wanted a priori to 
obtain an approximated expression, truncated at a given order, the simplest route to follow would 
have been different. In fact it is well know that a general expression for any (well behaved) PDF 
can be obtained from its generating functional by expanding the PDF in terms of all its (2nd and 
higher-order) correlations. 

This expansion can be written as (see e.g., [33] , the integrand of their Eq. 9, and [34]): 



, detK 1 / 2 
P(A\fa h ) = (2?r) „ /2 exp 



n tOO i 

[n]=i 3=1 



d 



dA r 



exp 



1 



(A,K,A) 



(4.27) 



where K denotes the inverse covariance matrix for A and [r<] is a compact notation to account for 
all relevant combinatorials for example for i — 3 the second sum runs over all triplets that can be 
combined out of the Ai...A n . Here £W denotes the connected correlation function (or corresponding 
polyspectra) of order i. This expression is exact, but it involves an infinite sum which must be 
truncated. In fact this expression simply says that any non-Gaussian distribution is specified by all 
its higher order correlations (only a Gaussian distribution is completely specified by its covariance). 
In other words Eqs. (4.26) and (4.27) are mathematically equivalent for local non-Gaussianity once 
the £ ra are the ones of the field A. For a practical application however, when the exact PDF is needed, 
Eq. (4.26) is preferable to Eq. (4.27) because of the infinite series in Eq. (4.27) which would require 
evaluation of an infinite sequence of correlation functions for A. Conversely, if for some applications 
an approximated expansion of the PDF is what is needed, it is overwhelmingly easier to work with 
Eq. (4.27), truncated at the appropriate order. 1 We expand to second order in /nl by first truncating 
the first sum to i = 4 - i.e., the trispectrum- and then expanding the exponential in Taylor series to 
second order. 

It is easy to see that if A = n = 1 and $ has unit variance, we recover Eq. (4.15). In doing 
so it is useful to keep in mind that for local non-Gaussianity = 6/nl and £^ 4 ) 
In the general case by performing the derivatives we obtain: 
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(4.28) 

where U run from 1 to n. Here we recognize that in this approximation the PDF is given by a suitable 
combination of bispectrum and trispectrum. We will return to this point below in the application to 
the CMB Sec. (5). The great advantage of this approach is that the resulting expression is valid for 
non-Gaussianity beyond the local type. While we have tested quantitatively the performance of this 



x As an exercise it is not too algebraically cumbersome to see that at first order in /nl Eq. (4.26) and Eq. (4.27) are 
truly equivalent. The univariate case for unit variance field was presented in Sec. (4.2). For the multi-variate case with a 
generic covariance, the most cumbersome part is to keep track of the various (symmetrized) permutations that go in £i 3 J 
for the local non-Gaussian case. Since in the simpler case analyzed in Sec. (4.2) and (4.3) we could compute the exact 
PDF, its first and second order expansions, and compare the performance of the exact PDF with its approximations, 
we conclude that a second-order expansion of Eq. (4.27) is suitable for many interesting practical applications (/nl in 
agreement with current constraints, sample size not too small). Here therefore we derive an explicit expression for the 
multi-variate non-Gaussian PDF to second-order using Eq. (4.27). 
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approximation for the local case, we expect that qualitatively it will hold also e.g., for equilateral, 
orthogonal or flattened type. Of course, Eq. (4.28) is strictly second-order in /nl for non-Gaussianity 
of the type where & NG ~ /nl</> * 4> (* denoting convolution or multiplication). Should other types 
of non-Gaussianity be considered, then the expression would not be necessarily second-order in the 
non-Gaussianity parameter. For example for the c/nl type where /nl = 0, the terms with £( 3 ) would 
drop out and the resulting expression would be hrst-order in <?nl- 

Given that in a practical application the A; are measured (given) quantities, the PDF can be 
interpreted as the probability distribution for /nl- Within a given model (like for example the local 
model) where the bispectrum and trispectrum coefficients are related (one being oc /nl and the other 
oc /nl) the PDF can be used to constrain the parameter (i.e. find the maximum likelihood and the 
confidence intervals). In a more model- independent way, one could imagine finding joint constraints on 
the coefficients of the bispectrum and trispectrum and then comparing those with theory predictions. 
In light of these considerations and the fact that for /nl values allowed by present data and for not 
too small sample sizes this expression gives and an extremely good approximation for the exact PDF, 
Eq. (4.28) represents one of the main results of this paper. For a CMB application of Eq. (4.28), see 
Sec. (5). 

4.7 Adding Gaussian (or near- Gaussian) noise 

As before (and using the same convention as of Sec. (3.4)) we can derive the result in two ways. 
The direct way is to recall that P(A) — P(A) * P(e) which will be further explored below (Sec. 5.2) 
in the context of measuring /jvl from a CMB map. In fact, although the approach is general, the 
implementation depends on the details of the concrete application. This approach might be best if 
the noise has complicated pixel-to-pixel correlations. The other way is to substitute the Dirac delta 
function by a Gaussian with width given by the noise covariance S e or by the noise distribution if it 
is non-Gaussian. Here we illustrate the approach in the Gaussian noise case. 

In this case our starting equation becomes Eq. (2.7) but where the product of the Dirac delta 
function is instead l/(27r)™/ 2 exp[— 1/2(A— K t , A— where K t denotes the inverse covariance 
of the noise and should due interpreted as matrix times a vector. Unfortunately the resulting 
path integral cannot be done exactly analytically but can be performed numerically. However even in 
the signal-only regime the last path integral has to be performed numerically (see discussion around 
Eq. (4.26)) thus correctly including the effect of noise does not represent an additional computational 
burden. 

On the other hand, if the noise contribution is sub-dominant compared to the signal (and yet 
non-negligible) a saddle point approximation performs very well: Eq. (4.26) then becomes the first 
term in a saddle point approximation. 

5 Application I: measuring /jy L from a CMB map 
5.1 Signal-only regime 

It is possible to consider Eq. (4.26) and interpret it as the probability of /nl given the observable 
vector A (which could be ag m or T etc.). This expression is exact. One would need to still do a 
path integral numerically but only for a grid of values for /nl to find the maximum likelihood and its 
confidence levels. The path integral can be computed using standards lattice QCD techniques. The 
integrand is close to a Gaussian so integration should converge easily. This makes possible to increase 
the size of the lattice, which sample cf>" in Eq. (4.26). In QCD applications the size of the lattice can 
be as large as billions, comparable with the size we estimate would be needed for a CMB application. 

This approach of using the PDF does not have the problem of the bispectrum that is that the 
number of triplets one can make grows much faster than the number of pixels in a map and therefore 
different bispectrum estimates are not independent (and thus the central limit theorem needs not 
to apply) [17]. One possible limitation is that the expression is exact only in the signal-dominated 
regime (i.e. no noise). This can be overcome as we discuss below. 
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5.2 Adding Gaussian noise 

As done in Sec. (3.4) consider that the observable field is A and is given by the superposition of 
two independent random processes, the signal A and the noise e, (Ai — A4 + £j). For the sum 
(superposition) of two independent random processes the probability distribution is the convolution 
of the two probabilities. Doing the convolution however might not be possible analytically. One thing 
to keep in mind is that if the noise is uncorrelated one need not do a iVpi x -dimensional convolution, 
but one can do iV p i x 1-dimcnsional convolutions. On the other hand, what we expect in the presence 
of noise is that the probability distribution of /nl is broadened by the noise: 

Pa(/nl|-4) = Pa(/nl|^)*P £ (/nl|£) (5.1) 

i.e. a convolution between the result of the signal-only regime with the distribution for the noise-only 
regime. Given that the noise has zero mean P e (/NL|e) must only depend on the noise covariance E e . 
The advantage of this approach is that one is left to do only a 1 dimensional convolution. 

While P c (/nl|£) has not been computed, if the noise is Gaussian or close to Gaussian, we know 
that it must be very well approximated by the distribution for the [29] -KSW- estimator applied to a 
purely Gaussian map (as it saturates the Cramer- Rao bound [31]). This distribution has been shown 
to be very well approximated by a Gaussian [19] -since for a large number of pixels the central limit 
theorem holds- with variance cr 2 ^ = l/(6<7 2 N p ix). 

Another equivalent approach is the one outlined in sec. (4.7). Depending on the nature of the 
noise one or the other will be best suited. 



5.3 Second-order PDF expansion 

Eq.(4.28) as it is can be interpreted as the PDF given the temperature pixels in a CMB map (i.e. 
A = T or A = AT/T, n the number of pixels, I the pixel index and w l the i-point correlation 
function). On the other hand it is useful to re- write it explicitly as a function of the spherical 
harmonic coefficients ai m also because it will make explicit the relation of our findings with previous 
work in the literature. In the following we denote by a the vector of a™ and by n the number of 
harmonic coefficients considered. We obtain: 



p , (detC- 1 ) 1 / 2 

P ( fl '/ NL ) = ( 27r )n/2 ^ 



i J2 (orCO [(C- 1 a)^HC- 1 a)T 2 2 (C- 1 a)^-3(C-X:iT( C ~ la ) 



+ 



all &s Tx\tA 



all im 



72 



III (j 

in 



Ah{c-X\T^ V )ZT^)T:^ 



(5.2) 

Note that in the second line of Eq. (5.2) we recognize the KSW estimator [29] and in fact Eq. 
(5) of [35]. In the next two lines we recognize Eq. (32), but more specifically Eq. (158), of [36]. 

The last three lines are "new" terms that arise from expanding the exponential to second order 
in /nl thus involving a term proportional to the bispectrum squared. 

We argue here that interpreting this as a likelihood for /nl enables one to combine optimally 
bispectrum and trispectrum measurements and obtain both best fit value and confidence intervals for 
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the non-Gaussianity parameter. As discussed above this expression is valid for non-Gaussianities more 
general than the local form as long as departures from Gaussianity are small as quantified in Sec. (4.3). 
This expression is strictly second order in /nl for the type of non-Gaussianity where § NG ~ /nl</> * 4> 
(* denoting convolution or multiplication) which includes local, equilateral, flattened and orthogonal 
shapes, for other types of non-Gaussianity, this expression will not necessarily be second order in 
the non-Gaussianity parameters. Within a given non-Gaussian model where the bispectrum and 
trispectrum amplitudes are specified by a single parameter, this PDF can be used to constrain such 
a parameter. Moreover, in a more model-independent approach one could use the above PDF to find 
joint constraints on the amplitude of the bispectrum and trispectrum for comparison with theory. 
Eq. (5.2) is one of the main results of this paper. 

6 (7nl case 

We can extend our formalism to the so-called <?nl model, in which 

$ - + 3nl(^ 3 - Z{4> 2 )4>) = 4> + .9ni> 3 , (6-1) 
which corresponds to the case when /nl is zero. If M. is the identity matrix, we obtain 



V(A) 



I dct K t 
(2 



x exp 



i 1 V L fat* 

- fri^friA^Koij - fr{Ai)<t> v K Qiv 



(6.2) 



where </> r is the real root of the third order equation A — (j> — <7nl0 3 = 0. The integral in (j> can be 
performed analytically as done before. If Ai is invertible but not the identity matrix, then A — > 
Mr 1 A and a factor of det A4 should be included in front of Eq. (6.3). In the general case where A4 
is not invertible, the technique presented in Sec. (4.5) can be similarly applied here to obtain a closed 
expression. 



7 Application II: Mass function of dark matter halos 

Note that in the case where we need the PDF of the linearly extrapolated 5 L , T'(5 L ), M. is invertible 
and we have obtained above an exact analytic expression: 

<i>,R J 

y VjJI' 

Here S L is the linearly extrapolated overdensity field smoothed on scale R, corresponding to the halo 
mass one is interested in; o\r denotes the RMS of 5 L (smoothed on scale R); cr<p,R is the corresponding 
quantity for the primordial field cf>. The sum is over the two roots (recall that the exponentially 
suppressed root can be safely neglected). 5 = M~ 1 S. Finally fi t 2{$) = 1/(2/nl) ( — 1 ± V 1 + 4/nl^)- 
V(S L ) is the basic ingredient to compute the mass function. In the old-fashioned Press-Schechter 
approach one only had to integrate the one point PDF for 6 L above the threshold for collapse S c 
(and introduce the fudge factor of 2). In this framework our approach gives an exact expression for 
the Press-Schechter mass function, which should not have the problem of negative PDF tails (as in 
the Edgeworth expansion, see [21, 37]) or being inaccurate for not so rare peaks [20]. The standard 
Press-Schechter approach fails in many respects but the one point PDF remains the key ingredients 
for more sophisticated formulations of the mass function [37-40]. In Ref. [41] showed the equivalence 
of simple barrier crossing to more modern and sophisticated approaches. Ref. [39] showed how to 
compute the (Gaussian and non-Gaussian) mass functions starting from the 1-point PDF and using 
a suitably-defined threshold. 



(2tt)M/2 Gr / 1+4f m +f j , r fr: 2 
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7.1 Application III: three-dimensional matter overdensity (e.g. 21cm surveys) 

In the case of surveys that map directly the three-dimensional 6 field, the size of the M. matrix 
is M — n as in Sec. (4.4), which simplifies the computations over the CMB case. Although one 
still needs to compute the transfer function because non-Gaussianity is in the cj> field, the equations 
simplify significantly, as the last path integral can be performed analytically, see Eq. (4.20). 

8 Conclusions 

We have presented a new formalism to compute the exact multi-variate joint probability distribution 
function (PDF) of a non-Gaussian field of the local type. We provide an exact formula for such PDF 
for the non-Gaussianity parameter /nl in Eq. (4.26). This exact formulation has several applications 
ranging from the halo mass function to the analysis of CMB maps. In its more generic formulation 
(and this for applications that rely on the generic formulation such as application to CMB maps) the 
final integration must be performed numerically. However in special cases (e.g., as the fundamental 
step for the derivation of the halo mass function) the resulting expression is fully analytic. We present 
the exact analytic formula for the 1-point non-Gaussian PDF in Eq. (4.12) which provides the key 
ingredient for an exact formulation of the mass function of dark halos; a formula to do so is provided 
in Eq. (7.1). 

These results serve as the basis for a fully Bayesian analysis of the non-Gaussianity parameter 
which has several advantages compared to the presently-used, state-of-the-art frequentists estimators. 
Most notably the fact that the PDF encloses the full information on the parameter of interest and 
therefore prohibitively expensive Monte Carlo estimation of the errors is not needed. While the 
analytic derivation is done in the signal-only regime, we also discuss how to include (analytically) the 
effect of instrumental noise. 

Having a full, exact PDF expression and formulating the problem in a Bayesian fashion, enabled 
us to uncover some features of the PDF for local non-Gaussianities that were previously ignored. 
Notably the fact that the PDF is not smooth nor continuous, making any approximation based 
on expansions on the /nl parameter of the PDF often unsuitable (and applicability of the central 
limit theorem limited). We however identify a regime where a second order expansion is a good 
approximation, this happens for a combination of /nl not too large (i.e., not ruled out by current 
measurements) and a sample size which is not too small (few tens or equivalently for a full sky CMB 
analysis (£ < £ max where £ max ~ few or larger). We present explicit expressions for this second order 
expansion approximation both in the general case, Eq. (4.28), and for the specific application to a 
CMB map, Eq. (5.2). In such expansions we can recognize previously known terms (such as the KSW 
estimator of Ref. [29], and the optimal trispectrum estimator of Ref. [36]) but also a new term oc /^ L . 
This formulation enables one to combine optimally bispectrum and trispectrum measurements and 
obtain both best-fit value and confidence intervals for the non-Gaussianity parameter. As an added 
bonus, this truncated expression is valid for non-Gaussianity beyond the local type. While we have 
tested quantitatively the performance and regime of validity of this approximation for the local case, 
we expect that qualitatively it will hold also e.g. for equilateral, orthogonal or flattened type and 
in general for non-Gaussianity where $> NG ~ /nl</> * <fi (* denoting convolution or multiplication). 
Within a given model (like for example the local model) where the bispectrum and trispectrum 
coefficients are related the PDF can be used to constrain the parameter (i.e. find the maximum 
likelihood and the confidence intervals) . In a more model- independent way, one could imagine finding 
joint constraints on the coefficients of the bispectrum and trispectrum and then comparing those with 
theory predictions. Our formulas will be useful to analyse upcoming experiments like Planck and 
large-scale galaxy surveys in searching for non-Gaussian features, compute the confidence intervals 
and in general do statistical inference also comparing and combining different data sets. 
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